Automobiles & Trucks
UrbanIng-V2X: ALarge-Scale Multi-Vehicle, Multi-Infrastructure Dataset Across Multiple Intersections for Cooperative Perception
Recent cooperative perception datasets have played a crucial role in advancing smart mobility applications by enabling information exchange between intelligent agents, helping to overcome challenges such as occlusions and improving overall scene understanding. While some existing real-world datasets incorporate both vehicle-to-vehicle and vehicle-to-infrastructure interactions, they are typically limited to a single intersection or a single vehicle. A comprehensive perception dataset featuring multiple connected vehicles and infrastructure sensors across several intersections remains unavailable, limiting the benchmarking of algorithms in diverse traffic environments. Consequently, overfitting can occur, and models may demonstrate misleadingly high performance due to similar intersection layouts and traffic participant behavior. To address this gap, we introduce UrbanIng-V2X, the first large-scale, multi-modal dataset supporting cooperative perception involving vehicles and infrastructure sensors deployed across three urban intersections in Ingolstadt, Germany. UrbanIng-V2X consists of 34 temporally aligned and spatially calibrated sensor sequences, each lasting 20 seconds. All sequences contain recordings from one of three intersections, involving two vehicles and up to three infrastructure-mounted sensor poles operating in coordinated scenarios. In total, UrbanIng-V2X provides data from 12 vehicle-mounted RGB cameras, 2 vehicle LiDARs, 17 infrastructure thermal cameras, and 12 infrastructure LiDARs. All sequences are annotated at a frequency of 10 Hz with 3D bounding boxes spanning 13 object classes, resulting in approximately 712k annotated instances across the dataset.
OpenAD: Open-World Autonomous Driving Benchmark for 3DObject Detection
Open-world perception aims to develop a model adaptable to novel domains and various sensor configurations and can understand uncommon objects and corner cases. However, current research lacks sufficiently comprehensive open-world 3D perception benchmarks and robust generalizable methodologies. This paper introduces OpenAD, the first real open-world autonomous driving benchmark for 3D object detection. OpenAD is built upon a corner case discovery and annotation pipeline that integrates with a multimodal large language model (MLLM). The proposed pipeline annotates corner case objects in a unified format for five autonomous driving perception datasets with 2000 scenarios. In addition, we devise evaluation methodologies and evaluate various open-world and specialized 2D and 3D models. Moreover, we propose a vision-centric 3D open-world object detection baseline and further introduce an ensemble method by fusing general and specialized models to address the issue of lower precision in existing open-world methods for the OpenAD benchmark.
SDPGO: Efficient Self-Distillation Training Meets Proximal Gradient Optimization
Self-knowledge distillation (SKD) enables single-model training by distilling knowledge from the model's own output, eliminating the need for a separate teacher network required in conventional distillation methods. However, current SKD methods focus mainly on replicating common features in the student model, neglecting the extraction of key features that significantly enhance student learning. Inspired by this, we devise a self-knowledge distillation framework entitled Self-Distillation training via Proximal Gradient Optimization or SDPGO, which utilizes gradient information to identify and assign greater weight to features that significantly impact classification performance, enabling the network to learn the most relevant features during training. Specifically, the proposed framework refines the gradient information into a dynamically changing weighting factor to evaluate the distillation knowledge via the dynamic weight adjustment scheme. Meanwhile, we devise the sequential iterative learning module to dynamically optimize knowledge transfer by leveraging historical predictions and real-time gradients, stabilizing training through mini-batch-based KL divergence refinement while adaptively prioritizing task-critical features for efficient self-distillation. Comprehensive experiments on image classification, object detection, and semantic segmentation demonstrate that our method consistently surpasses recent state-of-the-art knowledge distillation techniques.
HCRMP: ALLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Integrating the understanding and reasoning capabilities of Large Language Models (LLM) with the self-learning capabilities of Reinforcement Learning (RL) enables more reliable driving performance under complex driving conditions. There has been a lot of work exploring LLM-Dominated RL methods in the field of autonomous driving motion planning. These methods, which utilize LLM to directly generate policies or provide decisive instructions during policy learning of RL agent, are centrally characterized by an over-reliance on LLM outputs. However, LLM outputs are susceptible to hallucinations. Evaluations show that state-of-theart LLM indicates a non-hallucination rate of only approximately 57.95% when assessed on essential driving-related tasks. Thus, in these methods, hallucinations from the LLM can directly jeopardize the performance of driving policies.
Flexible Controllability Generation and Reconstruction with High Fidelity Semantic) (Vector / BEV Layout Map Render Depth RGB Camera Render3D Box &-Scene: World GeneratorLow-Level Control
Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consistent generation. However, large-scale 3D scene generation requiring spatial coherence remains underexplored. In this paper, we present X-Scene, a novel framework for large-scale driving scene generation that achieves geometric intricacy, appearance fidelity, and flexible controllability. Specifically, X-Scene supports multi-granular control, including low-level layout conditioning driven by user input or text for detailed scene composition, and high-level semantic guidance informed by user intent and LLM-enriched prompts for efficient customization. To enhance geometric and visual fidelity, we introduce a unified pipeline that sequentially generates 3D semantic occupancy and corresponding multi-view images and videos, ensuring alignment and temporal consistency across modalities. We further extend local regions into large-scale scenes via consistencyaware outpainting, which extrapolates occupancy and images from previously generated areas to maintain spatial and visual coherence. The resulting scenes are lifted into high-quality 3DGS representations, supporting diverse applications such as simulation and scene exploration. Extensive experiments demonstrate that X-Scene substantially advances controllability and fidelity in large-scale scene generation, empowering data generation and simulation for autonomous driving.
AGC-Drive: ALarge-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
By sharing information across multiple agents, collaborative perception helps autonomous vehicles mitigate occlusions and improve overall perception accuracy. While most previous work focus on vehicle-to-vehicle and vehicle-to-infrastructure collaboration, with limited attention to aerial perspectives provided by UAVs, which uniquely offer dynamic, top-down views to alleviate occlusions and monitor large-scale interactive environments. A major reason for this is the lack of highquality datasets for aerial-ground collaborative scenarios. To bridge this gap, we present AGC-Drive, the first large-scale real-world dataset for Aerial-Ground Cooperative 3D perception. The data collection platform consists of two vehicles, each equipped with five cameras and one LiDAR sensor, and one UAV carrying a forward-facing camera and a LiDAR sensor, enabling comprehensive multi-view and multi-agent perception.
An Industrial Grade for Vehicle Aerodynamic Optimization
Vehicle aerodynamics optimization has become critical for automotive electrification, where drag reduction directly determines electric vehicle range and energy efficiency. Traditional approaches face an intractable trade-off: computationally expensive Computational Fluid Dynamics (CFD) simulations requiring weeks per design iteration, or simplified models that sacrifice production-grade accuracy. While machine learning offers transformative potential, existing datasets exhibit fundamental limitations--inadequate mesh resolution, missing vehicle components, and validation errors exceeding 5%--preventing deployment in industrial workflows. We present DrivAerStar, comprising 12,000 industrial-grade automotive CFD simulations generated using STAR-CCM+ software. The dataset systematically explores three vehicle configurations through 20 Computer Aided Design (CAD) parameters via Free Form Deformation (FFD) algorithms, including complete engine compartments and cooling systems with realistic internal airflow. DrivAerStar achieves wind tunnel validation accuracy below 1.04%-- a five-fold improvement over existing datasets--through refined mesh strategies with strict wall y` control. Benchmarks demonstrate that models trained on this data achieve production-ready accuracy while reducing computational costs from weeks to minutes. This represents the first dataset bridging academic machine learning research and industrial CFD practice, establishing a new standard for data-driven aerodynamic optimization in automotive development. Beyond automotive applications, DrivAerStardemonstrates a paradigm for integrating highfidelity physics simulations with Artificial Intelligence (AI) across engineering disciplines where computational constraints currently limit innovation.
2028 Mercedes-Benz VLE first drive: Your 8K living room on wheels has arrived
Benz's electric Grand Limousine might just make minivans cool. The concept of a living room on wheels is something of a modern clichรฉ in the automotive world, a vision for a car so comfortable, well-appointed and ultimately luxurious that you'd be just as happy to spend hours there as you would lounging at home. The problem is that most of those concepts, like the Cadillac InnerSpace or Mini Urbanaut, have depended on the availability of self-driving technology, something that still only exists in the limited circles of Waymo, Zoox and their ilk. We're still years away from you or I being able to buy a car that can drive itself unsupervised, but that isn't stopping Mercedes from releasing what could be the most compelling of the rolling living spaces. It's called the VLE, and while it requires a human behind the wheel, passengers in the second row will be treated to reclining, massaging seats, a 22-speaker Dolby Atmos sound system and a 31.3-inch
RayFusion: Ray Fusion Enhanced Collaborative Visual Perception
Collaborative visual perception methods have gained widespread attention in the autonomous driving community in recent years due to their ability to address sensor limitation problems. However, the absence of explicit depth information often makes it difficult for camera-based perception systems, e.g., 3D object detection, to generate accurate predictions. To alleviate the ambiguity in depth estimation, we propose RayFusion, a ray-based fusion method for collaborative visual perception. Using ray occupancy information from collaborators, RayFusion reduces redundancy and false positive predictions along camera rays, enhancing the detection performance of purely camera-based collaborative perception systems. Comprehensive experiments show that our method consistently outperforms existing stateof-the-art models, substantially advancing the performance of collaborative visual perception.
Availability-aware Sensor Fusion via Unified Canonical Space
Sensor fusion of camera, LiDAR, and 4-dimensional (4D) Radar has brought a significant performance improvement in autonomous driving. However, there still exist fundamental challenges: deeply coupled fusion methods assume continuous sensor availability, making them vulnerable to sensor degradation and failure, whereas sensor-wise cross-attention fusion methods struggle with computational cost and unified feature representation. This paper presents availability-aware sensor fusion (ASF), a novel method that employs unified canonical projection (UCP) to enable consistency in all sensor features for fusion and cross-attention across sensors along patches (CASAP) to enhance robustness of sensor fusion against sensor degradation and failure. As a result, the proposed ASF shows a superior object detection performance to the existing state-of-the-art fusion methods under various weather and sensor degradation (or failure) conditions. Extensive experiments on the K-Radar dataset demonstrate that ASF achieves improvements of 9.7% in APBEV (87.2%) and 20.1% in AP3D (73.6%) in object detection at IoU=0.5, while requiring a low computational cost.